Decision Tree & Random Forest

Decision Tree

/assets/images/decision-tree-1.png|500

How it works

  1. choose one feature and find the best split of this feature
  2. repeat Step 1 for the other features (brute force)
  3. select the feature whose split yields the purest labels / maximum information gain
  4. repeat from the root node down to the leaf nodes until the data at each leaf is pure (see the sketch after this list)
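
A minimal sketch of the split search in Steps 1–3, assuming a small binary-classification toy dataset and entropy as the impurity measure (all names below are illustrative, not taken from a library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Brute-force every feature and threshold; return the split
    with the maximum information gain."""
    base = entropy(y)
    best = (None, None, -np.inf)          # (feature index, threshold, gain)
    n_samples, n_features = X.shape
    for j in range(n_features):           # Step 2: try every feature
        for t in np.unique(X[:, j]):      # Step 1: try every candidate split
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / n_samples
            gain = base - child           # Step 3: information gain of this split
            if gain > best[2]:
                best = (j, t, gain)
    return best

# toy data: feature 0 separates the classes, feature 1 carries no information
X = np.array([[2.0, 1.0], [3.0, 2.0], [10.0, 1.0], [11.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # -> (0, 3.0, 1.0): split on feature 0 at 3.0
```
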
Pruning a decision tree

  • Pruning consists of going back through the tree once it has been built and removing branches that do not contribute enough to error reduction, replacing them with leaf nodes.
  • It is a bottom-up technique that reduces overfitting.
  • Two approaches:
    • pre-pruning: prune while growing the tree
    • post-pruning: prune once the tree has been grown to its full depth (see the sketch below)
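
As a hedged illustration of post-pruning, here is a minimal sketch using scikit-learn's cost-complexity pruning (the ccp_alpha value and dataset are arbitrary choices for demonstration; pre-pruning would instead use parameters such as max_depth or min_samples_leaf):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fully grown tree vs. post-pruned tree (larger ccp_alpha removes more branches)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

print(full.get_n_leaves(), full.score(X_test, y_test))      # many leaves, can overfit
print(pruned.get_n_leaves(), pruned.score(X_test, y_test))  # fewer leaves, often generalizes better
```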

Hyperparameters

Pros & Cons

Pros

Cons

Metrics

Random Forest

Random Forest is an Ensemble Learning method that combines an ensemble of Decision Trees. The ensemble is necessary because a single tree is sensitive to small changes in the data.

How it works

  1. pick K data points at random from the training set, with bootstrapping (sampling with replacement)
  2. build the decision tree associated with these K data points; at each node, if n features are available, only a random subset of them (commonly √n) is considered for the split
  3. repeat Steps 1 & 2 according to the number of trees you prefer (usually > 500 trees)
  4. predicted value = average of the predicted values across all trees (regression) or the majority vote (classification); see the sketch after this list
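
A minimal sketch of the procedure above, assuming a regression task and scikit-learn's RandomForestRegressor (n_estimators is the number of trees, max_features the size of the per-node feature subset; the dataset is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True resamples the training set for each tree (Steps 1-2),
# max_features="sqrt" limits the features considered at each node,
# and the forest prediction averages over all 500 trees (Step 4)
forest = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                               bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # R^2 on held-out data
```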

Hyperparameters

Pros & Cons

Pros

Cons